Can Spanish Be Simpler? LexSiS: Lexical Simplification for Spanish

Authors

  • Stefan Bott
  • Luz Rello
  • Biljana Drndarevic
  • Horacio Saggion
Abstract

Lexical simplification is the task of replacing a word in a given context by an easier-to-understand synonym. Although a number of lexical simplification approaches have been developed in recent years, most of them have been applied to English, with recent work taking advantage of parallel monolingual datasets for training. Here we present LexSiS, a lexical simplification system for Spanish that does not require a parallel corpus, but instead relies on freely available resources, such as an on-line dictionary and the Web as a corpus. LexSiS uses three techniques for finding a suitable word substitute: a word vector model, word frequency, and word length. In experiments with human informants, we have verified that LexSiS performs better than a hard-to-beat baseline based on synonym frequency.

TITLE AND ABSTRACT IN SPANISH

¿Puede ser el Español más simple? LexSiS: Simplificación Léxica en Español

La tarea de simplificación léxica consiste en sustituir una palabra en un contexto determinado por un sinónimo que sea más sencillo de comprender. Aunque en los últimos años han aparecido algunos sistemas para desempeñar esta tarea, la mayoría de ellos se han desarrollado para el inglés y hacen uso de corpus paralelos. En este artículo presentamos LexSiS, un sistema de simplificación léxica en español que utiliza recursos libremente disponibles tales como un diccionario en línea o la Web como corpus, sin la necesidad de acudir a la creación de corpus paralelos. LexSiS utiliza tres técnicas para encontrar un sustituto léxico más simple: un modelo vectorial basado en palabras, la frecuencia de las palabras y la longitud de las palabras. Una evaluación realizada con tres anotadores demuestra que para algunos conjuntos de datos LexSiS propone sinónimos más simples que el sinónimo más frecuente.
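The abstract names three signals (a word vector model, word frequency, and word length) and a baseline that picks the most frequent synonym. The sketch below illustrates how such signals could be combined to rank candidate synonyms. It is not the LexSiS scoring function: the weighting `alpha`, the `simplicity` formula, the function names, and the toy vectors and frequency counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def simplicity(word: str, freq: dict) -> float:
    """Assumed heuristic: frequent and short words count as simpler."""
    return math.log(1 + freq.get(word, 0)) / len(word)


def most_frequent_synonym(candidates, freq):
    """Frequency baseline: pick the candidate with the highest count."""
    return max(candidates, key=lambda w: freq.get(w, 0))


def rank_candidates(context_words, candidates, vectors, freq, alpha=0.5):
    """Rank candidates by a weighted mix of context fit and simplicity.

    alpha is a hypothetical weight; the two terms are not normalized
    to a common scale in this sketch.
    """
    context = Counter(context_words)
    scored = []
    for cand in candidates:
        fit = cosine(context, vectors.get(cand, Counter()))
        score = alpha * fit + (1 - alpha) * simplicity(cand, freq)
        scored.append((score, cand))
    return [cand for _, cand in sorted(scored, reverse=True)]


# Toy data (invented): co-occurrence vectors and corpus counts.
VECTORS = {
    "casa": Counter({"familia": 3, "alquiler": 2, "vivir": 1}),
    "morada": Counter({"eterna": 2, "poeta": 1}),
    "domicilio": Counter({"fiscal": 2, "alquiler": 1}),
}
FREQ = {"casa": 1000, "morada": 20, "domicilio": 150}
CONTEXT = ["familia", "busca", "alquiler"]
CANDIDATES = ["morada", "domicilio", "casa"]

if __name__ == "__main__":
    print(rank_candidates(CONTEXT, CANDIDATES, VECTORS, FREQ))
    print(most_frequent_synonym(CANDIDATES, FREQ))
```

In this toy setup the context vector stands in for the sentence surrounding the target word, and each candidate vector stands in for the contexts that word typically appears in. A real system would derive both from a corpus and would need to calibrate how the three signals are weighted.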


Similar resources

CASSAurus: A Resource of Simpler Spanish Synonyms

In this work we introduce and describe a language resource composed of lists of simpler synonyms for Spanish. The synonyms are divided in different senses taken from the Spanish OpenThesaurus, where context disambiguation was performed by using statistical information from the Web and Google Books Ngrams. This resource is freely available online and can be used for different NLP tasks such as l...


Can Numerical Expressions Be Simpler? Implementation and Demostration of a Numerical Simplification System for Spanish

Information in newspapers is often showed in the form of numerical expressions which present comprehension problems for many people, including people with disabilities, illiteracy or lack of access to advanced technology. The purpose of this paper is to motivate, describe, and demonstrate a rule-based lexical component that simplifies numerical expressions in Spanish texts. We propose an approa...


Comparing Resources for Spanish Lexical Simplification

In this paper we study the effect of different lexical resources and strategies for selecting synonyms in a lexical simplification system for the Spanish language. The resources used for the experiments are the Spanish EuroWordNet, the Spanish Open Thesaurus and a combination of both. As for the synonym selection strategies, we have used both local and global contexts for word sense disambiguat...


CASSA: A Context-Aware Synonym Simplification Algorithm

We present a new context-aware method for lexical simplification that uses two free language resources and real web frequencies. We compare it with the state-of-the-art method for lexical simplification in Spanish and the established simplification baseline, that is, the most frequent synonym. Our method improves upon the other methods in the detection of complex words, in meaning preservation,...


Towards Automatic Lexical Simplification in Spanish: An Empirical Study

In this paper we present the results of the analysis of a parallel corpus of original and simplified texts in Spanish, gathered for the purpose of developing an automatic simplification system for this language. The system is intended for individuals with cognitive disabilities who experience difficulties reading and interpreting informative texts. We here concentrate on lexical simplification ...




Publication date: 2012